The Simulation Engine lets you test your agent against synthetic conversations before deploying to real users. It generates realistic test inputs based on your agent’s role and goal, runs them against the live agent, and scores the results across quality and safety metrics.

How it works

  1. Open an agent and go to Safety and Evaluations > Simulation Engine.
  2. Define or auto-generate scenarios: situations the agent should handle. Examples: “angry customer demanding a refund,” “user asking an out-of-scope question.”
  3. Define or auto-generate personas: user types the agent will encounter. Examples: “non-technical user,” “enterprise decision-maker,” “hostile adversarial user.”
  4. The engine combines scenarios and personas into test cases automatically.
  5. Run the simulation. The engine executes each test case and scores the results.
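
The combination in step 4 can be pictured as pairing scenarios with personas. The sketch below is illustrative only: `SimulationCase` and `build_test_cases` are hypothetical names, not part of the product, and it assumes every scenario is paired with every persona, which may not match how the engine actually selects combinations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SimulationCase:
    """One scenario/persona pairing (hypothetical structure)."""
    scenario: str
    persona: str


def build_test_cases(scenarios, personas):
    """Pair every scenario with every persona (assumed strategy)."""
    return [SimulationCase(s, p) for s, p in product(scenarios, personas)]


scenarios = [
    "angry customer demanding a refund",
    "user asking an out-of-scope question",
]
personas = [
    "non-technical user",
    "enterprise decision-maker",
    "hostile adversarial user",
]

test_cases = build_test_cases(scenarios, personas)
print(len(test_cases))  # 2 scenarios x 3 personas = 6 test cases
```

Under this assumption the number of test cases grows multiplicatively, so a handful of scenarios and personas already gives broad coverage.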

Scoring metrics

| Metric | What it measures |
| --- | --- |
| Task Completion | Did the agent accomplish what the user asked? |
| Hallucination | Did the agent fabricate facts not present in its knowledge? |
| Faithfulness | Is the response grounded in the connected Knowledge Base? |
| Toxicity | Did the agent produce harmful content? |
| Bias | Did the agent treat any group unfairly? |
| Tool Accuracy | Did the agent call the right tool with the correct arguments? |

Agent Hardening

When test cases fail, select them and choose Agent Hardening. The engine analyzes the failure patterns and recommends changes to the agent’s instructions, model selection, or feature configuration (for example, enabling Reflection for an agent that is hallucinating). Review the recommendations, apply them to the agent, and re-run the simulation to confirm improvement.
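
To get a feel for failure patterns before reviewing the recommendations, it can help to count which metrics fail most often. The sketch below is not how the engine performs its analysis; the record format and the `failure_summary` function are assumptions made for illustration, standing in for whatever results you have on hand.

```python
from collections import Counter

# Hypothetical per-test-case results: metric name -> whether it passed.
results = [
    {"case": "refund / hostile user",
     "metrics": {"Task Completion": True, "Hallucination": False, "Tool Accuracy": True}},
    {"case": "out-of-scope / non-technical user",
     "metrics": {"Task Completion": True, "Hallucination": False, "Tool Accuracy": False}},
    {"case": "refund / enterprise decision-maker",
     "metrics": {"Task Completion": True, "Hallucination": True, "Tool Accuracy": True}},
]


def failure_summary(results):
    """Count failures per metric, most frequent first."""
    counts = Counter()
    for result in results:
        counts.update(name for name, passed in result["metrics"].items() if not passed)
    return counts.most_common()


print(failure_summary(results))
# [('Hallucination', 2), ('Tool Accuracy', 1)]
```

A summary dominated by one metric, such as Hallucination here, suggests which kind of recommendation to look at first.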

Before going to production

Run the Simulation Engine until the agent meets your quality bar. A reasonable threshold for most production agents is 90% or higher task completion, zero toxicity failures, a hallucination rate below your acceptable limit, and all tool calls producing correct outputs. Treat the Simulation Engine as the primary quality gate before promoting any agent to a production environment.
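
The threshold above can be written down as an explicit check. This is a minimal sketch under stated assumptions: `CaseResult` and `meets_quality_bar` are hypothetical names, the per-case fields are simplified to booleans, and the 5% hallucination limit is a placeholder for whatever limit you consider acceptable.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Simplified outcome of one test case (hypothetical structure)."""
    task_completed: bool
    hallucinated: bool
    toxic: bool
    tool_calls_correct: bool


def meets_quality_bar(results, min_task_completion=0.90, max_hallucination_rate=0.05):
    """Return True if the run meets the suggested production threshold."""
    if not results:
        return False  # an empty run proves nothing
    total = len(results)
    task_completion = sum(r.task_completed for r in results) / total
    hallucination_rate = sum(r.hallucinated for r in results) / total
    toxicity_failures = sum(r.toxic for r in results)
    all_tools_correct = all(r.tool_calls_correct for r in results)
    return (
        task_completion >= min_task_completion
        and toxicity_failures == 0
        and hallucination_rate < max_hallucination_rate  # placeholder limit
        and all_tools_correct
    )


# 19 of 20 cases complete the task; no hallucination, toxicity, or tool errors.
results = [CaseResult(True, False, False, True)] * 19
results.append(CaseResult(False, False, False, True))
print(meets_quality_bar(results))  # True
```

Keeping the limits as parameters makes it easy to tighten the bar for higher-risk agents without changing the check itself.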
